
perf: Optimize substr_index to use bulk-NULL string builder #21877

Merged
martin-g merged 7 commits into apache:main from neilconway:neilc/perf-builder-substr-index on Apr 28, 2026


@neilconway
Contributor

@neilconway commented on Apr 27, 2026

Which issue does this PR close?

Rationale for this change

As with other recent optimizations, we can optimize NULL handling in substr_index by using the new bulk-NULL string builders.
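For context, substr_index(str, delim, count) returns the part of str before count occurrences of delim, counting from the left when count is positive and from the right when it is negative (the _pos and _neg benchmark variants). The std-only Rust sketch below models that per-row logic. The function shown here is an illustration, not DataFusion's code, and its edge-case handling (count = 0 or an empty delimiter returning an empty string) is an assumption.

```rust
/// Sketch of substr_index for a single row, assuming the documented
/// "substring before `count` occurrences of `delim`" semantics.
fn substr_index<'a>(s: &'a str, delim: &str, count: i64) -> &'a str {
    // Assumed edge cases: a zero count or an empty delimiter yields "".
    if count == 0 || delim.is_empty() {
        return "";
    }
    // How many delimiter occurrences to count (saturates on 32-bit targets).
    let n = usize::try_from(count.unsigned_abs()).unwrap_or(usize::MAX);
    if count > 0 {
        // Positive count: keep everything left of the n-th delimiter from the left.
        match s.match_indices(delim).nth(n - 1) {
            Some((idx, _)) => &s[..idx],
            None => s, // fewer than n delimiters: return the whole string
        }
    } else {
        // Negative count: keep everything right of the n-th delimiter from the right.
        match s.rmatch_indices(delim).nth(n - 1) {
            Some((idx, _)) => &s[idx + delim.len()..],
            None => s,
        }
    }
}
```

For example, counting from the left with 2 on "www.mysql.com" keeps "www.mysql", while -2 keeps "mysql.com".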

Benchmarks:

Utf8

  • utf8_100_array_long_delimiter: 10.0 µs → 10.1 µs (+1.00%)
  • utf8_100_array_single_delimiter: 2.9 µs → 2.5 µs (−13.79%)
  • utf8_100_scalar_long_delimiter_neg: 4.1 µs → 3.5 µs (−14.63%)
  • utf8_100_scalar_long_delimiter_pos: 2.9 µs → 2.7 µs (−6.90%)
  • utf8_100_scalar_single_delimiter_neg: 2.2 µs → 1.993 µs (−9.41%)
  • utf8_100_scalar_single_delimiter_pos: 2.1 µs → 1.845 µs (−12.13%)
  • utf8_1000_array_long_delimiter: 101.0 µs → 101.1 µs (+0.10%)
  • utf8_1000_array_single_delimiter: 36.8 µs → 31.7 µs (−13.86%)
  • utf8_1000_scalar_long_delimiter_neg: 38.9 µs → 36.9 µs (−5.14%)
  • utf8_1000_scalar_long_delimiter_pos: 25.1 µs → 23.3 µs (−7.17%)
  • utf8_1000_scalar_single_delimiter_neg: 19.3 µs → 17.7 µs (−8.29%)
  • utf8_1000_scalar_single_delimiter_pos: 18.2 µs → 16.6 µs (−8.79%)
  • utf8_10000_array_long_delimiter: 1083.4 µs → 1038.2 µs (−4.17%)
  • utf8_10000_array_single_delimiter: 461.8 µs → 414.7 µs (−10.20%)
  • utf8_10000_scalar_long_delimiter_neg: 392.4 µs → 379.3 µs (−3.34%)
  • utf8_10000_scalar_long_delimiter_pos: 246.5 µs → 227.4 µs (−7.75%)
  • utf8_10000_scalar_single_delimiter_neg: 191.3 µs → 177.5 µs (−7.21%)
  • utf8_10000_scalar_single_delimiter_pos: 179.4 µs → 168.8 µs (−5.91%)

Utf8View

  • utf8view_100_array_long_delimiter: 9.5 µs → 9.8 µs (+3.16%)
  • utf8view_100_array_single_delimiter: 2.6 µs → 2.6 µs (0.00%)
  • utf8view_100_scalar_long_delimiter_neg: 4.0 µs → 4.0 µs (0.00%)
  • utf8view_100_scalar_long_delimiter_pos: 2.8 µs → 2.8 µs (0.00%)
  • utf8view_100_scalar_single_delimiter_neg: 2.3 µs → 2.3 µs (0.00%)
  • utf8view_100_scalar_single_delimiter_pos: 2.2 µs → 2.1 µs (−4.55%)
  • utf8view_1000_array_long_delimiter: 94.8 µs → 99.2 µs (+4.64%)
  • utf8view_1000_array_single_delimiter: 31.5 µs → 32.0 µs (+1.59%)
  • utf8view_1000_scalar_long_delimiter_neg: 38.7 µs → 39.0 µs (+0.78%)
  • utf8view_1000_scalar_long_delimiter_pos: 25.4 µs → 25.4 µs (0.00%)
  • utf8view_1000_scalar_single_delimiter_neg: 21.4 µs → 21.8 µs (+1.87%)
  • utf8view_1000_scalar_single_delimiter_pos: 20.8 µs → 20.9 µs (+0.48%)
  • utf8view_10000_array_long_delimiter: 998.4 µs → 1025.4 µs (+2.70%)
  • utf8view_10000_array_single_delimiter: 414.9 µs → 415.7 µs (+0.19%)
  • utf8view_10000_scalar_long_delimiter_neg: 393.7 µs → 395.9 µs (+0.56%)
  • utf8view_10000_scalar_long_delimiter_pos: 253.4 µs → 252.7 µs (−0.28%)
  • utf8view_10000_scalar_single_delimiter_neg: 214.5 µs → 217.3 µs (+1.31%)
  • utf8view_10000_scalar_single_delimiter_pos: 207.9 µs → 208.7 µs (+0.38%)
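The percentages in both lists are relative changes in timing, (after − before) / before × 100, so negative numbers are speedups. A minimal helper that reproduces them:

```rust
/// Relative change between two timings, in percent.
/// Negative values mean the new code is faster.
fn pct_change(before_us: f64, after_us: f64) -> f64 {
    (after_us - before_us) / before_us * 100.0
}

/// Rounds to two decimal places, matching the precision used in the lists.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

Recomputing from the rounded timings can differ from a listed figure in the last digit (for example, 2.1 µs → 1.845 µs gives −12.14% rather than the listed −12.13%), presumably because the percentages were computed from unrounded measurements.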

This PR doesn't touch the Utf8View code path, so the Utf8View regressions above are likely measurement noise.

What changes are included in this PR?

  • Optimize substr_index by switching from Arrow string builders to bulk-NULL string builders
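To make the difference concrete, here is a simplified std-only model of the two builder styles. PerRowBuilder mimics an Arrow-style builder, where every append also records a validity entry; BulkNullBuilder receives the whole validity mask once in finish, so append_value only writes bytes and an offset. All type and method names here (including append_placeholder and to_offset) are hypothetical, and validity is a Vec<bool> rather than Arrow's bit-packed buffer, for readability.

```rust
/// Converts a byte length into an i32 offset (Utf8, not LargeUtf8).
fn to_offset(len: usize) -> i32 {
    i32::try_from(len).expect("string data exceeds i32 offsets")
}

/// Model of a conventional builder: validity is recorded on every append.
struct PerRowBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl PerRowBuilder {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new(), validity: Vec::new() }
    }
    fn append_value(&mut self, v: &str) {
        self.values.extend_from_slice(v.as_bytes());
        self.offsets.push(to_offset(self.values.len()));
        self.validity.push(true); // extra per-row write
    }
    fn append_null(&mut self) {
        self.offsets.push(to_offset(self.values.len()));
        self.validity.push(false);
    }
    fn finish(self) -> (Vec<i32>, Vec<u8>, Vec<bool>) {
        (self.offsets, self.values, self.validity)
    }
}

/// Hypothetical bulk-NULL builder: validity is computed up front and
/// attached in `finish`, so appends never touch it.
struct BulkNullBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl BulkNullBuilder {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }
    fn append_value(&mut self, v: &str) {
        self.values.extend_from_slice(v.as_bytes());
        self.offsets.push(to_offset(self.values.len()));
    }
    /// Zero-length slot for a row the precomputed mask already marks as null.
    fn append_placeholder(&mut self) {
        self.offsets.push(to_offset(self.values.len()));
    }
    fn finish(self, validity: Vec<bool>) -> (Vec<i32>, Vec<u8>, Vec<bool>) {
        (self.offsets, self.values, validity)
    }
}
```

Both builders produce the same offsets, values and validity; the bulk version simply moves the validity work out of the per-row path.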

Are these changes tested?

Yes, covered by existing tests.

Are there any user-facing changes?

No.

@github-actions (bot) added the functions label (Changes to functions implementation) on Apr 27, 2026
Review thread on the following diff excerpt:

```rust
_ => builder.append_null(),
let num_rows = string_array.len();
// Output is null IFF any input is null.
let nulls = NullBuffer::union(
```
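The comment "Output is null IFF any input is null" boils down to a bitwise AND of validity bitmaps. arrow-rs exposes this as NullBuffer::union, which takes two Option<&NullBuffer> values and returns an Option<NullBuffer>, treating None as "no nulls". The sketch below reproduces that convention with plain u64 words so it runs without the arrow crate. union_validity and is_valid are hypothetical helpers, and the three substr_index inputs are handled by chaining two unions.

```rust
/// Validity bitmaps are bit-packed u64 words, least-significant bit first;
/// a set bit means "valid", and `None` means "no nulls at all".
fn union_validity(a: Option<&[u64]>, b: Option<&[u64]>) -> Option<Vec<u64>> {
    match (a, b) {
        (None, None) => None,
        (Some(x), None) | (None, Some(x)) => Some(x.to_vec()),
        (Some(x), Some(y)) => {
            assert_eq!(x.len(), y.len(), "bitmaps must cover the same rows");
            // A row is valid only if it is valid in both inputs: one AND per
            // 64 rows, in a sequential loop the compiler can typically vectorize.
            Some(x.iter().zip(y).map(|(l, r)| l & r).collect())
        }
    }
}

/// Single-bit validity test for row `i`.
fn is_valid(bits: &[u64], i: usize) -> bool {
    (bits[i / 64] >> (i % 64)) & 1 == 1
}
```

For three inputs (string, delimiter, count), the result of the first union is fed into the second via `as_deref()`.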
Contributor

Isn't this highly dependent on heuristics, i.e. on whether the cost of computing the bitmap union is actually justified by the append_null calls it lets us avoid?

Especially since the alternative approach of scanning rows one by one also has a sequential access pattern?

Contributor

Though I am not experienced enough to comment on the performance of Arrow's bitmap union.

Contributor Author

Good question, @RatulDawar!

Empirically, taking this approach appears to be a significant performance win (although in some cases the overhead of NULL handling might not be huge to begin with, depending on the UDF). Intuitively, it makes sense that this would be faster:

  • Computing the union of two bitmaps can be accelerated with SIMD, and the Arrow primitives are well-optimized. It's also an entirely sequential memory access pattern.
  • append_value in the bulk-NULL builder does not need to touch the NULL bitmap for every row, whereas in the Arrow builder it does. That means fewer per-row branches and also less data cache pressure.
  • Similarly, we can now just check a single bit to look for NULLs, rather than doing three conditionals. It's also easy for the compiler to hoist the is-the-NULL-bitmap-None check outside of the loop and elide it entirely (I haven't checked if LLVM is actually doing this but it could).
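The last two points can be sketched as a single loop over rows, written here without the arrow crate. The Option check on the precomputed validity bitmap is hoisted out of the loop by hand (whether LLVM does this on its own is, as noted above, unverified), and the per-row NULL test is a single bit lookup. map_valid_rows and push_slot are hypothetical names; the bitmap is assumed to be LSB-first u64 words that cover every row.

```rust
/// Appends one output slot: its bytes (possibly none) and the end offset.
fn push_slot(offsets: &mut Vec<i32>, values: &mut Vec<u8>, bytes: &[u8]) {
    values.extend_from_slice(bytes);
    offsets.push(i32::try_from(values.len()).expect("offsets overflow i32"));
}

/// Applies `f` to every valid row and builds Utf8-style offsets/values.
/// Null rows get an empty slot; their validity lives in the precomputed bitmap.
fn map_valid_rows<F>(rows: &[&str], validity: Option<&[u64]>, mut f: F) -> (Vec<i32>, Vec<u8>)
where
    F: FnMut(&str) -> &str,
{
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    offsets.push(0);
    let mut values = Vec::new();
    match validity {
        // No nulls at all: the loop body never consults a bitmap.
        None => {
            for &row in rows {
                push_slot(&mut offsets, &mut values, f(row).as_bytes());
            }
        }
        // Nulls present: one bit test per row picks the path.
        Some(bits) => {
            for (i, &row) in rows.iter().enumerate() {
                let valid = (bits[i / 64] >> (i % 64)) & 1 == 1;
                let out = if valid { f(row) } else { "" };
                push_slot(&mut offsets, &mut values, out.as_bytes());
            }
        }
    }
    (offsets, values)
}
```

Compared with a builder that calls append_null or records a validity bit per row, the only per-row branch left here is the single bit test, and it disappears entirely when the input has no nulls.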

Contributor

  • append_value in the bulk-NULL builder does not need to touch the NULL bitmap for every row, whereas in the Arrow builder it does. That means fewer per-row branches and also less data cache pressure.

Makes sense. My takeaway is that, in many cases, we can split simple, tightly coupled work (like NULL handling) into its own pass to get better cache locality and SIMD.
Thanks for the explanation, @neilconway.

@martin-g martin-g added this pull request to the merge queue Apr 28, 2026
@martin-g
Member

Thank you, @neilconway, @Jefffrey & @RatulDawar!

Merged via the queue into apache:main with commit ec92925 Apr 28, 2026
32 checks passed

Development

Successfully merging this pull request may close these issues.

Optimize substr_index to use bulk-NULL string builder

4 participants